Abstract

This paper is concerned with the well-known Jeffreys-Lindley paradox. In a Bayesian setup,
the so-called paradox arises when a point null hypothesis is tested and an objective prior is
sought for the alternative hypothesis. In particular, the posterior probability of the null hypothesis
tends to one when the uncertainty, i.e. the variance, for the parameter value goes to infinity. We
argue that the appropriate way to deal with the paradox is to use simple mathematics, and
that any philosophical argument is to be regarded as irrelevant.

Some key words: Bayes factor, Bayesian hypothesis testing, Kullback-Leibler divergence,
self-information loss


1 Introduction 


The literature on the Jeffreys-Lindley paradox has been prolific since it was brought to the attention
of objective Bayesians by Lindley (1957). Many authors have discussed this so-called paradox from
varying perspectives, including not only statisticians but philosophers too. Our aim is to consider
the problem using simple mathematics.


Lindley (1957) shows that, for point null hypothesis testing, there may be a concern with the
objective Bayesian approach. In the specific example used, if the prior for the location parameter,
in the alternative model to the parameter being zero, has infinite variance, then the Bayesian
will always select the null model, regardless of the observed data. This was first suggested as a
warning against using improper priors, but the consequences have now become far reaching, with
a substantial amount of literature written about the observation.

Let us describe the mathematical setting of the problem. Suppose we wish to test the hypothesis

$$H_0 : \theta = 0 \quad \text{vs} \quad H_1 : \theta \neq 0$$

for the normal model $N(x|\theta,1)$. Let $p_0 = P(M_0)$ be the prior probability assigned to the null
hypothesis and let $\pi(\theta) = N(\theta|0,\sigma^2)$, for some $\sigma > 0$, be the prior distribution for the unknown
parameter $\theta$ under the alternative model.

Then the Bayes factor for this problem is given by

$$B_{01} = \frac{N(x|0,1)}{\int N(x|\theta,1)\,\pi(\theta)\,d\theta},$$

which represents the odds in favour of the null hypothesis with respect to the alternative. The
decision on whether one rejects $H_0$ in favour of $H_1$ is based on the posterior probability, given by

$$P(M_0|x) = \left[1 + \frac{1-P(M_0)}{P(M_0)}\,\frac{1}{B_{01}}\right]^{-1}.$$


This is the extent of the mathematical foundations to the problem. As Lindley noted, there are
some combinations of $(p_0, \sigma)$ yielding a $P(M_0|x)$ which one would not wish to countenance.

The natural objective choice for $\pi(\theta)$ involves taking $\sigma = \infty$. However, rather than a direct
plug-in of this value, a more general setting has been suggested and considered by Robert (1993),
which is to let $p_0$ depend on $\sigma$, i.e. we have $p_0(\sigma)$, so that it is possible to study $P(M_0|x)$ as
$\sigma \to \infty$. In this case we can identify three scenarios for $P(M_0|x)$ as $\sigma \to \infty$, all of which have
associated problems. That is, there is no setting, i.e. choice of $p_0(\sigma)$, in which the choice $\sigma = \infty$
as an objective choice can work. What we mean by this is explained in Section 2. The conclusion
is that the objective idea of $\sigma = \infty$ does not work and consequently the message is not to use
it. On the other hand, we can set the pair $(\sigma < \infty, p_0(\sigma))$ objectively using ideas of Type I error
calculations and a novel approach to the selection of priors for models.

The layout of the paper is as follows. In Section 2 we formalise the Jeffreys-Lindley paradox
and discuss the solution of Robert (1993). Section 3 is dedicated to our approach, and Section
4 is reserved for conclusions and final comments.


2 Formalisation of the paradox 

In order to discuss approaches to the Jeffreys-Lindley paradox, let us first formalise it and, at the 
same time, define the notation. The objective is to compare the two normal models. 


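$$M_0 = \left\{ N(x|0,1) = (2\pi)^{-1/2} \exp\left(-\tfrac{1}{2}x^2\right) \right\},$$

$$M_1 = \left\{ N(x|\theta,1) = (2\pi)^{-1/2} \exp\left\{-\tfrac{1}{2}(x-\theta)^2\right\} \right\}.$$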


To apply the Bayesian approach, as described in Section 1, we need to define both the priors; i.e.
the value of $\sigma$, for the unknown parameter $\theta$, and the prior for the null hypothesis; i.e. the value
of $p_0$. To be most general we will assume that $p_0$ can depend on $\sigma$ and hence we write it as $p_0(\sigma)$.

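With this information we can compute the Bayes factor representing the odds of the null
hypothesis $H_0$. That is

$$B_{01} = \frac{N(x|0,1)}{\int N(x|\theta,1)\,N(\theta|0,\sigma^2)\,d\theta} = \frac{e^{-\frac{1}{2}x^2}}{e^{-\frac{1}{2}x^2/(\sigma^2+1)}\big/\sqrt{\sigma^2+1}},$$

so the posterior probability for the null hypothesis is given by

$$P(H_0|x) = \left[1 + \frac{1-p_0}{p_0}\,\frac{1}{B_{01}}\right]^{-1} = \left[1 + \frac{1-p_0(\sigma)}{p_0(\sigma)}\,\frac{e^{\frac{1}{2}x^2\sigma^2/(\sigma^2+1)}}{\sqrt{\sigma^2+1}}\right]^{-1}. \tag{1}$$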
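As a numerical illustration of (1), the following minimal Python sketch evaluates the closed-form Bayes factor and the posterior probability of the null hypothesis for a prior mass that does not depend on $\sigma$. The observation x = 3 and the fixed value p0 = 1/2 are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math


def bayes_factor_01(x, sigma):
    """Closed form of B_01: sqrt(1 + s^2) * exp(-0.5 * x^2 * s^2 / (1 + s^2))."""
    s2 = sigma ** 2
    return math.sqrt(1.0 + s2) * math.exp(-0.5 * x * x * s2 / (1.0 + s2))


def posterior_null(x, sigma, p0):
    """Posterior probability of the null hypothesis, as in (1)."""
    prior_odds_against = (1.0 - p0) / p0
    return 1.0 / (1.0 + prior_odds_against / bayes_factor_01(x, sigma))


x = 3.0  # assumed observation, three standard deviations away from zero
for sigma in (1.0, 10.0, 100.0, 1e4, 1e6):
    print(f"sigma = {sigma:>9.0f}   P(H0 | x) = {posterior_null(x, sigma, p0=0.5):.5f}")
```

With p0 held fixed, the printed posterior probability eventually approaches one as $\sigma$ grows, even though x lies far from the null value.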
We note in (1) that the quantity

$$m(\sigma) = \frac{1-p_0(\sigma)}{p_0(\sigma)}\,\frac{1}{\sqrt{1+\sigma^2}}$$

is the key term and opens the way to understanding the paradox. We now assume the decision
maker wants to select $\sigma = \infty$ in order to implement an objective Bayesian approach. To adequately
understand this procedure we argue the decision maker needs to specify $m(\sigma)$ as $\sigma \to \infty$, and to
this end we identify three important and exhaustive cases:


(i) $m(\sigma) \to 0$. Under this scenario we have the undesirable result that $P(H_0|x)$ converges to
one regardless of the $x$ value. This is the so-called paradoxical result. In fact, $m(\sigma) \to 0$
whenever for large $\sigma$ we have, for any $\varepsilon > 0$,

$$\frac{1-p_0(\sigma)}{p_0(\sigma)}\,\frac{1}{\sqrt{1+\sigma^2}} < \varepsilon
\;\Rightarrow\; \frac{1-p_0(\sigma)}{p_0(\sigma)} < \varepsilon\sqrt{1+\sigma^2}
\;\Rightarrow\; p_0(\sigma) > \frac{1}{1+\varepsilon\sqrt{1+\sigma^2}}.$$

So if the prior on the null hypothesis is too large as $\sigma \to \infty$; i.e. $\sigma\,p_0(\sigma) \to \infty$, then the
posterior probability on the null hypothesis will converge to 1.


(ii) $m(\sigma) \to c$ for some constant $0 < c < \infty$. Under this scenario it is that, for large $\sigma$,

$$p_0(\sigma) = \frac{1}{1 + c\sqrt{1+\sigma^2}} \approx \frac{1}{1 + c\,\sigma}.$$

In particular, Robert (1993) presents an objective argument for

$$p_0(\sigma) = \frac{1}{1 + \sqrt{2\pi}\,\sigma}.$$

However, this idea leads to an undesirable inconsistency in that $p_0(\sigma) \to 0$ yet $P(M_0|x)$ is
converging to a constant bounded away from 0. Thus, with $\sigma = \infty$, we have $P(M_0) = 0$ but
$P(M_0|x) \neq 0$, which are incoherent choices.


(iii) $m(\sigma) \to \infty$. Under this scenario we have that $P(M_0|x) \to 0$. This at least now becomes
consistent with the prior probability since $p_0(\sigma) \to 0$ in this case. Yet it is undesirable in that
with $\sigma = \infty$, $P(M_0|x) = 0$.

These considerations clearly exclude the choice $\sigma = \infty$. It simply does not work. Thus a finite
choice of $\sigma$ is required. In the next section we will demonstrate how we can set $(\sigma < \infty, p_0(\sigma))$
objectively.
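The three regimes can be checked numerically. The sketch below, given under stated assumptions, evaluates $m(\sigma)$ and the posterior probability in (1) for one prior of each type: a fixed p0 = 1/2 for case (i), p0 = 1/(1 + c sigma) with c = sqrt(2 pi) for case (ii), and a hypothetical p0 = 1/(1 + sigma^2) for case (iii). The observation x = 2, the specific priors for (i) and (iii), and all names in the code are illustrative choices, not taken from the text.

```python
import math


def m(sigma, p0):
    """Key quantity m(sigma) = (1 - p0) / p0 / sqrt(1 + sigma^2)."""
    return (1.0 - p0) / p0 / math.sqrt(1.0 + sigma ** 2)


def posterior_null(x, sigma, p0):
    """Posterior probability of the null, written in terms of m(sigma)."""
    s2 = sigma ** 2
    return 1.0 / (1.0 + m(sigma, p0) * math.exp(0.5 * x * x * s2 / (1.0 + s2)))


# One illustrative prior p0(sigma) per case; (i) and (iii) are hypothetical choices.
PRIORS = {
    "i": lambda s: 0.5,                                          # m -> 0
    "ii": lambda s: 1.0 / (1.0 + math.sqrt(2.0 * math.pi) * s),  # m -> sqrt(2 pi)
    "iii": lambda s: 1.0 / (1.0 + s ** 2),                       # m -> infinity
}

x = 2.0  # assumed observation
for case, prior in PRIORS.items():
    for sigma in (1e1, 1e3, 1e5):
        p0 = prior(sigma)
        print(f"({case}) sigma = {sigma:.0e}  p0 = {p0:.3e}  "
              f"m = {m(sigma, p0):.3e}  P(M0 | x) = {posterior_null(x, sigma, p0):.6f}")
```

In case (ii) the posterior settles near $1/(1 + c\,e^{x^2/2})$ while the prior mass on the null shrinks to zero, which is the incoherence described above.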


3 An objective choice for $(\sigma, p_0(\sigma))$

Given a value of $\sigma$ we first, in Section 3.1, show how to obtain an objective choice for $p_0(\sigma)$. Then,
in Section 3.2, we show how $\sigma < \infty$ can be selected objectively.

3.1 The prior $p_0(\sigma)$

Our approach consists in measuring the worth of the alternative hypothesis with respect to the
null, as outlined in Villa and Walker (2014). In particular, we apply the well-known asymptotic
Bayesian property that, if a model is misspecified, the posterior accumulates at the model which
is the nearest, in terms of Kullback-Leibler divergence, to the true model (Berk, 1966). As such, the
divergence $D_{KL}(N(x|\theta,1)\,\|\,N(x|0,1))$ represents the loss we would incur if model $M_1$ is removed
and it is true. Since we do not know $\theta$, but we have the prior $\pi(\theta)$, we can compute the expected
loss as

$$\int D_{KL}\big(N(x|\theta,1)\,\|\,N(x|0,1)\big)\,\pi(\theta)\,d\theta = \int \tfrac{1}{2}\theta^2\,\pi(\theta)\,d\theta = \tfrac{1}{2}\sigma^2. \tag{2}$$
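For completeness, the two equalities in (2) can be verified directly. The short derivation below is a sketch that uses only the normal densities defined in Section 2 and the fact that the prior $N(\theta|0,\sigma^2)$ has second moment $\sigma^2$.

```latex
\begin{align*}
D_{KL}\big(N(x|\theta,1)\,\|\,N(x|0,1)\big)
  &= \int N(x|\theta,1)\,\log\frac{N(x|\theta,1)}{N(x|0,1)}\,dx \\
  &= E_{x \sim N(\theta,1)}\!\left[-\tfrac{1}{2}(x-\theta)^2 + \tfrac{1}{2}x^2\right] \\
  &= E_{x \sim N(\theta,1)}\!\left[\theta x - \tfrac{1}{2}\theta^2\right]
   = \theta^2 - \tfrac{1}{2}\theta^2 = \tfrac{1}{2}\theta^2, \\[4pt]
\int \tfrac{1}{2}\theta^2\,N(\theta|0,\sigma^2)\,d\theta
  &= \tfrac{1}{2}\,E[\theta^2] = \tfrac{1}{2}\sigma^2 .
\end{align*}
```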


The model prior is determined by means of the self-information loss function (Merhav and Feder,
1998), which represents the loss connected to a probability statement. For model $M$, the
self-information loss is given by $-\log P(M)$. Therefore, by equating the self-information with the
expected loss determined in (2), we have that the prior on the alternative model is

$$1 - p_0(\sigma) \propto e^{\frac{1}{2}\sigma^2}.$$

Note that the prior for the null hypothesis is $p_0(\sigma) \propto 1$, and so we have

$$p_0(\sigma) = \frac{1}{1 + \exp\{\frac{1}{2}\sigma^2\}}.$$

This then fits into category (iii) for large $\sigma$, which implies that $P(M_0|x,\sigma)$ goes to zero as
$P(M_0) \to 0$. Thus there is coherence in this approach; however, we are not advocating the choice of large $\sigma$.
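A minimal sketch of this prior, assuming the model masses are taken proportional to the exponential of the expected loss (so the null receives weight exp(0) = 1 and the alternative exp(sigma^2 / 2)), is given below. The function names are hypothetical; the code simply tabulates $p_0(\sigma)$ and the resulting $m(\sigma)$.

```python
import math


def expected_kl_loss(sigma):
    """Expected Kullback-Leibler loss of removing the alternative model, as in (2)."""
    return 0.5 * sigma ** 2


def prior_null(sigma):
    """p0(sigma) = 1 / (1 + exp(sigma^2 / 2)), after normalising the two weights."""
    return 1.0 / (1.0 + math.exp(expected_kl_loss(sigma)))


def m(sigma):
    """m(sigma) = (1 - p0) / p0 / sqrt(1 + sigma^2) under the prior above."""
    p0 = prior_null(sigma)
    return (1.0 - p0) / p0 / math.sqrt(1.0 + sigma ** 2)


for sigma in (0.5, 1.0, 2.0, 4.0, 6.0):
    print(f"sigma = {sigma:3.1f}   p0 = {prior_null(sigma):.3e}   m = {m(sigma):.3e}")
```

The tabulated values show $p_0(\sigma)$ decreasing and $m(\sigma)$ increasing in $\sigma$, in line with category (iii).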


3.2 Determining $\sigma$

In any classical test the Type I error is of key importance. We can use this quantity to set the
value for $\sigma$ objectively. Whether the Type I error is itself an objective quantity may be debated,
but it needs to be set nevertheless, and a valid objective Bayesian criterion is to match classical
benchmarks and quantities.

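To determine an appropriate value for $\sigma$ based on the classical concept of Type I error, we
would select $\sigma$ so that

$$P_0(\text{reject } H_0) = \alpha,$$

where $\alpha \in (0,1)$ and $P_0$ is the probability under the null hypothesis. Regardless of the surroundings,
all Bayesian experimenters in this problem would need to assign an $\alpha_B$ value for which one would
reject $H_0$ if $P(M_0|x) < \alpha_B$. To have

$$P_0\big(P(M_0|x) < \alpha_B\big) = \alpha, \tag{3}$$

we require

$$\begin{aligned}
\left[1 + m(\sigma)\exp\left\{\tfrac{1}{2}x^2\frac{\sigma^2}{1+\sigma^2}\right\}\right]^{-1} &< \alpha_B, \\
\text{i.e.} \quad \exp\left\{\tfrac{1}{2}x^2\frac{\sigma^2}{1+\sigma^2}\right\} &> \frac{1/\alpha_B - 1}{m(\sigma)}, \\
\tfrac{1}{2}x^2\frac{\sigma^2}{1+\sigma^2} &> \log\left(\frac{1/\alpha_B - 1}{m(\sigma)}\right), \\
x^2 &> \frac{2(1+\sigma^2)}{\sigma^2}\log\left(\frac{1/\alpha_B - 1}{m(\sigma)}\right).
\end{aligned}$$

Therefore, if we write

$$\psi(\sigma) = \frac{2(1+\sigma^2)}{\sigma^2}\log\left(\frac{1/\alpha_B - 1}{m(\sigma)}\right), \tag{4}$$

we have

$$P_0\big(x^2 > \psi(\sigma)\big) = 2\left[1 - \Phi\left(\sqrt{\psi(\sigma)}\right)\right] = \alpha, \tag{5}$$

where $\Phi$ denotes the standard normal distribution function.

Figure 1: Plot of $\log\psi(\sigma)$, with $\alpha_B = 0.05$, for $\sigma < 1.3$. By setting $\sigma = 1.2933$ we ensure that
$\psi(\sigma) > 0$.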


The key here is that $\psi(\sigma)$ is decreasing as $\sigma$ increases, so there is a one-to-one correspondence
between $\alpha$ and $\sigma$ satisfying (5). Figure 1 shows the behaviour of $\log\psi(\sigma)$, given $\alpha_B = 0.05$. As
it must be that $\psi(\sigma) > 0$, we compute $\log\psi(\sigma)$ up to $\sigma = 1.2930$, which is the value that ensures
$m(\sigma) < 1/\alpha_B - 1$ and, therefore, a positive $\psi(\sigma)$.

Expression (5) has to be solved numerically. So, for example, if $\alpha_B = \alpha = 0.05$, we would have
$\sigma = 0.44$. In other words, we can be objective about $\sigma$ with a finite value. The notion, therefore,
that $\sigma = \infty$ is the only objective choice is wrong. An objective classical test requires an
$\alpha$ value and it is this which can be linked to the (finite) objective choice for $\sigma$.
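The mechanics of such a numerical solution can be sketched with a plain bisection search. The code below is an illustration under stated assumptions: it takes $\psi(\sigma)$ exactly as written in (4), uses the prior $p_0(\sigma) = 1/(1 + \exp\{\sigma^2/2\})$ of Section 3.1, and relies on $\psi$ being decreasing wherever it is positive. The function names, the search bracket and the tolerance are hypothetical, and the value returned depends on these assumptions and on the chosen $\alpha_B$.

```python
import math


def m(sigma):
    """m(sigma) under p0(sigma) = 1 / (1 + exp(sigma^2 / 2))."""
    return math.exp(0.5 * sigma ** 2) / math.sqrt(1.0 + sigma ** 2)


def psi(sigma, alpha_b):
    """Threshold on x^2 as written in (4)."""
    s2 = sigma ** 2
    return 2.0 * (1.0 + s2) / s2 * math.log((1.0 / alpha_b - 1.0) / m(sigma))


def type_one_error(sigma, alpha_b):
    """P_0(x^2 > psi) = 2 * (1 - Phi(sqrt(psi))) = erfc(sqrt(psi / 2))."""
    value = psi(sigma, alpha_b)
    if value <= 0.0:
        return 1.0  # a non-positive threshold rejects for every x
    return math.erfc(math.sqrt(value / 2.0))


def solve_sigma(alpha, alpha_b, lo=1e-3, hi=10.0, tol=1e-10):
    """Bisection on sigma: the Type I error increases as psi decreases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if type_one_error(mid, alpha_b) < alpha:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


sigma_hat = solve_sigma(alpha=0.05, alpha_b=0.05)
print(sigma_hat, psi(sigma_hat, 0.05), type_one_error(sigma_hat, 0.05))
```

Bisection is used here only because it needs nothing beyond monotonicity; any standard root finder applied to (5) would serve the same purpose.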

4 Discussion 


The findings of this paper can be summarised as follows. The posterior for the point null hypothesis
is driven by the quantity $m(\sigma)$; in particular, if $\sigma = \infty$ is desired as an objective criterion then
the behaviour of $m(\sigma)$ as $\sigma \to \infty$ is the key. If the prior $p_0(\sigma)$ is fixed, e.g. is equal to 1/2, then
the Jeffreys-Lindley paradox arises, since the posterior probability $P(H_0|x)$ goes to one. Robert
(1993) proposed to solve the issue by having $m(\sigma)$ converge to a positive constant. Although
the direct paradox is avoided, the approach gives an incoherent result as the posterior mass on
$H_0$ is positive whereas the prior mass is zero. Our approach gives a quantity $m(\sigma)$ which goes to
infinity, for $\sigma$ going to infinity, which both solves the paradox and yields zero posterior mass for
$H_0$ when $p_0(\sigma) = 0$, that is, when the prior mass for $H_0$ is zero.

It is clear that the three types of behaviour of $m(\sigma)$ for large $\sigma$ rule out the possibility of having
$\sigma = \infty$. As such, $\sigma$ has to be determined to have a finite value. For $p_0(\sigma)$, the choice can be either
objective or subjective. Our approach allows $p_0(\sigma)$ to be determined in an objective fashion by
considering the loss in information if the true model is removed. Dellaportas et al. (2012), on the
other hand, propose a prior for the null hypothesis that is subjective.


Dellaportas et al. (2012) focus on models for which the use of a multivariate normal prior is
appropriate, such as linear regression models, generalised linear models and standard time series
models. The idea is to set the multiplicative constant for the prior dispersion matrix, $c_M$, which will
indicate the level of prior uncertainty. The authors aim to reduce the sensitivity of the posterior
model probabilities to the scale of the prior by suitably specifying the prior model probability.
This is done by setting

$$P(M) \propto P'(M)\,c_M^{d_M/2},$$

where $d_M$ is the dimension of the model $M$ and $P'(M)$ is a suitably determined baseline prior
model probability. Dellaportas et al. (2012) recommend $P'(M) \propto 1$, although other choices are
possible. We see that the core of the whole approach is to make the prior model probability
dependent on the variance of the prior on the parameters, avoiding the Jeffreys-Lindley paradox.


The conclusion is that it is not possible to be objective for $\pi(\theta)$ by setting $\sigma = \infty$. This is
not the sole case where objective Bayes fails to deliver adoptable solutions. For example, Jeffreys'
rule prior for multidimensional parameter spaces gives a prior distribution with poor performance
properties (Bernardo and Smith, 1994). It is common practice not to use the Jeffreys prior in this
type of problem and to opt for a different solution, such as reference priors.

However, an objective and finite value of $\sigma$ can be assigned by exploiting the thinking behind
classical tests and setting the Type I error. That is, there is a one-to-one correspondence between
$\sigma$ and the Type I error $\alpha$, and it is this correspondence which permits the interpretation and
assignment of $\sigma$.

Surprisingly, or not, there have been philosophical papers attempting to find some hidden
profound explanation behind the paradox; see, for example, the recent papers of Spanos (2013)
and Sprenger (2013). We argue that it is not necessary to philosophize, as the mathematics of
the problem is quite straightforward and a clear picture of what is happening can be obtained
solely by mathematical considerations.

To discuss some of the philosophy, Spanos (2013) says: “The question that generally arises is
why the Bayesian and the likelihoodist approaches give rise to the above conflicting and confusing
results”. However, we have $P(M_0|x) < \alpha_B \iff x^2 > \psi(\sigma)$, which is precisely the form of the classical
test!

The classical test is: reject $H_0$ if $x^2 > c_\alpha$, where $P_0(x^2 > c_\alpha) = \alpha$. We can then set $\psi(\sigma) = c_\alpha$
to ensure a standard value for the Type I error. Consequently, Bayes makes no contribution to this
problem, since even a subjective Bayesian approach will yield a classical test, but with perhaps
a non-standard Type I error. Such an observation between Bayesian and classical tests has been
made by Shively and Walker (2011).

In short, both Bayesian and classical tests reject $H_0$ if $x^2 > c$, and this is the obvious procedure
for testing $H_0 : \theta = 0$. How one determines $c$ makes the difference, either via a Type I error, $\alpha$, or
via a prior $\pi(\theta)$, i.e. $\sigma$, but nevertheless there is a one-to-one correspondence between the two.